audio deepfake
New AI technique sounding out audio deepfakes
Researchers from Australia's national science agency CSIRO, Federation University Australia and RMIT University have developed a method to improve the detection of audio deepfakes. The new technique, Rehearsal with Auxiliary-Informed Sampling (RAIS), is designed for audio deepfake detection, a growing cybercrime threat that enables attacks such as bypassing voice-based biometric authentication systems, impersonation and disinformation. It determines whether an audio clip is real or artificially generated (a 'deepfake') and maintains performance over time as attack types evolve. In Italy earlier this year, an AI-cloned voice of the country's Defence Minister requested a €1M 'ransom' from prominent business leaders, convincing some to pay. This is just one of many examples highlighting the need for audio deepfake detectors.
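RAIS builds on rehearsal, a standard continual-learning technique in which a small memory of past training examples is replayed alongside new data so the detector keeps recognizing old attack types while learning new ones. Below is a minimal sketch of a plain rehearsal buffer to illustrate that general idea only; the reservoir-style replacement policy shown here is a common placeholder, and RAIS's auxiliary-informed sampling strategy for choosing what to store is not reproduced.

```python
import random

class RehearsalBuffer:
    """Toy rehearsal memory for continual learning (illustrative only)."""

    def __init__(self, capacity=1000):
        self.capacity = capacity
        self.memory = []  # list of (clip, label) pairs from past tasks

    def add(self, clip, label):
        if len(self.memory) < self.capacity:
            self.memory.append((clip, label))
        else:
            # Reservoir-style replacement keeps roughly uniform coverage
            # of everything seen so far; RAIS uses a smarter, auxiliary-
            # informed selection instead.
            i = random.randrange(self.capacity)
            self.memory[i] = (clip, label)

    def sample(self, k):
        """Draw past examples to mix into the current training batch."""
        return random.sample(self.memory, min(k, len(self.memory)))
```

In training, each new batch would be augmented with `buffer.sample(k)` so gradients reflect both new and remembered attack types.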
The Download: combating audio deepfakes, and AI in the classroom
The news: A new technique known as "machine unlearning" could be used to teach AI models to forget specific voices. How it works: Currently, companies tend to deal with this issue by checking whether the prompts or the AI's responses contain disallowed material. Machine unlearning instead asks whether an AI can be made to forget a piece of information that the company doesn't want it to know. It works by taking a model and the specific data to be redacted, then using them to create a new model: essentially, a version of the original that never learned that piece of data. Why it matters: This could be an important step in stopping the rise of audio deepfakes, where someone's voice is copied to carry out fraud or scams.
- Information Technology > Security & Privacy (0.63)
- Education > Educational Setting (0.47)
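One simple family of machine-unlearning methods approximates forgetting by running gradient ascent on the data to be removed, pushing the model away from fitting it. The sketch below illustrates that general idea only; it is not the specific technique referenced in the article, and the model, data loader and hyperparameters are hypothetical.

```python
import torch
import torch.nn.functional as F

def unlearn_by_gradient_ascent(model, forget_loader, lr=1e-4, steps=50):
    """Toy unlearning sketch: maximize the loss on the 'forget' set.

    `model` and `forget_loader` are assumed stand-ins; real unlearning
    methods add safeguards so overall model quality is preserved.
    """
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    model.train()
    step = 0
    for inputs, targets in forget_loader:
        if step >= steps:
            break
        optimizer.zero_grad()
        loss = F.cross_entropy(model(inputs), targets)
        (-loss).backward()  # ascend: increase loss on the data to forget
        optimizer.step()
        step += 1
    return model
```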
How to spot a deepfake: the maker of a detection tool shares the key giveaways
Sometimes there are no background noises when there should be. Or, in the case of the robocall, a lot of noise is mixed into the background, seemingly to lend an air of realness, yet it actually sounds unnatural. With photos, it helps to zoom in and examine closely for any "inconsistencies with the physical world or human pathology", like buildings with crooked lines or hands with six fingers, Lyu said. Little details like hair, mouths and shadows can hold clues to whether something is real. Hands were once a clearer tell for AI-generated images because they would more frequently end up with extra appendages, though the technology has improved and that's becoming less common, Lyu said.
- North America > United States > New Hampshire (0.05)
- Europe > Russia (0.05)
- Asia > Russia (0.05)
- Information Technology > Security & Privacy (1.00)
- Government (1.00)
CLAD: Robust Audio Deepfake Detection Against Manipulation Attacks with Contrastive Learning
Haolin Wu, Jing Chen, Ruiying Du, Cong Wu, Kun He, Xingcan Shang, Hao Ren, Guowen Xu
The increasing prevalence of audio deepfakes poses significant security threats, necessitating robust detection methods. While existing detection systems exhibit promise, their robustness against malicious audio manipulations remains underexplored. To bridge this gap, we undertake the first comprehensive study of the susceptibility of the most widely adopted audio deepfake detectors to manipulation attacks. Surprisingly, even manipulations like volume control can significantly bypass detection without affecting human perception. To address this, we propose CLAD (Contrastive Learning-based Audio deepfake Detector) to enhance the robustness against manipulation attacks. The key idea is to incorporate contrastive learning to minimize the variations introduced by manipulations, therefore enhancing detection robustness. Additionally, we incorporate a length loss, aiming to improve the detection accuracy by clustering real audio more closely in the feature space. We comprehensively evaluated the most widely adopted audio deepfake detection models and our proposed CLAD against various manipulation attacks. The detection models exhibited vulnerabilities, with the false acceptance rate (FAR) rising to 36.69%, 31.23%, and 51.28% under volume control, fading, and noise injection, respectively. CLAD enhanced robustness, reducing the FAR to 0.81% under noise injection and consistently maintaining an FAR below 1.63% across all tests. Our source code and documentation are available in the artifact repository (https://github.com/CLAD23/CLAD).
- Asia > China > Hubei Province > Wuhan (0.05)
- North America > United States > Washington > King County > Seattle (0.04)
- Asia > China > Hong Kong (0.04)
- (9 more...)
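The CLAD abstract above hinges on contrastive learning that pulls a clip and its manipulated counterpart together in feature space, making the detector's features invariant to manipulations like gain changes or added noise. The following is a minimal sketch of that kind of invariance objective, assuming a generic encoder; the toy manipulations, the cosine-similarity loss and all hyperparameters are illustrative stand-ins, not the authors' implementation (see their repository for the real code).

```python
import torch
import torch.nn.functional as F

def manipulate(waveform):
    """Cheap stand-in manipulations: random gain plus additive noise."""
    gain = torch.empty(1).uniform_(0.5, 1.5)
    noise = 0.005 * torch.randn_like(waveform)
    return gain * waveform + noise

def contrastive_invariance_loss(encoder, waveforms):
    """Pull each clip and its manipulated view together in feature space.

    `encoder` is an assumed embedding network mapping a batch of
    waveforms [B, T] to features [B, D].
    """
    z1 = F.normalize(encoder(waveforms), dim=-1)
    z2 = F.normalize(encoder(manipulate(waveforms)), dim=-1)
    # Loss is zero when original and manipulated embeddings coincide.
    return (1.0 - F.cosine_similarity(z1, z2, dim=-1)).mean()
```

In a full detector this invariance term would be combined with the usual real-versus-fake classification loss (and, in CLAD, the length loss the abstract mentions).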
4 Ways AI Transformed Music, Movies and Art in 2023
Artificial intelligence began to reshape music, movies and art in 2023, sparking both enthusiasm and panic. Some artists used AI to aid their creative practices. Others took legal action against the companies that co-opted art to make their models more powerful. As battles played out across picket lines and courtrooms, millions of viewers and listeners around the world tuned into AI-created content with curiosity, disdain and glee. Here are the major ways AI impacted culture this year.
- Leisure & Entertainment (1.00)
- Law > Litigation (0.98)
- Media > Music (0.72)
- Government > Regional Government > North America Government > United States Government (0.72)
MFAAN: Unveiling Audio Deepfakes with a Multi-Feature Authenticity Network
Karthik Sivarama Krishnan, Koushik Sivarama Krishnan
In the contemporary digital age, the proliferation of deepfakes presents a formidable challenge to the sanctity of information dissemination. Audio deepfakes, in particular, can be deceptively realistic, posing significant risks in misinformation campaigns. To address this threat, we introduce the Multi-Feature Audio Authenticity Network (MFAAN), an advanced architecture tailored for the detection of fabricated audio content. MFAAN incorporates multiple parallel paths designed to harness the strengths of different audio representations, including Mel-frequency cepstral coefficients (MFCC), linear-frequency cepstral coefficients (LFCC), and Chroma Short Time Fourier Transform (Chroma-STFT). By synergistically fusing these features, MFAAN achieves a nuanced understanding of audio content, facilitating robust differentiation between genuine and manipulated recordings. Preliminary evaluations of MFAAN on two benchmark datasets, 'In-the-Wild' Audio Deepfake Data and The Fake-or-Real Dataset, demonstrate its superior performance, achieving accuracies of 98.93% and 94.47% respectively. Such results not only underscore the efficacy of MFAAN but also highlight its potential as a pivotal tool in the ongoing battle against deepfake audio content.
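To make the parallel-path design concrete, here is a minimal sketch of a multi-branch network in the spirit of the abstract: one small branch per audio representation (MFCC, LFCC, Chroma-STFT), with the branch outputs concatenated before a real/fake classifier head. The branch architecture, layer sizes and pooling are assumptions for illustration; the published MFAAN architecture may differ.

```python
import torch
import torch.nn as nn

class MultiFeatureFusionNet(nn.Module):
    """Toy parallel-path fusion network (illustrative, not MFAAN itself)."""

    def __init__(self, n_mfcc=40, n_lfcc=40, n_chroma=12, hidden=64):
        super().__init__()

        # One small 1-D conv branch per feature type; features are
        # assumed to arrive as [batch, n_coeffs, time] tensors.
        def branch(n_in):
            return nn.Sequential(
                nn.Conv1d(n_in, hidden, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.AdaptiveAvgPool1d(1),  # pool over the time axis
                nn.Flatten(),
            )

        self.mfcc_branch = branch(n_mfcc)
        self.lfcc_branch = branch(n_lfcc)
        self.chroma_branch = branch(n_chroma)
        self.classifier = nn.Linear(3 * hidden, 2)  # real vs. fake

    def forward(self, mfcc, lfcc, chroma):
        # Fuse the three representations by concatenation, then classify.
        fused = torch.cat(
            [self.mfcc_branch(mfcc),
             self.lfcc_branch(lfcc),
             self.chroma_branch(chroma)],
            dim=-1,
        )
        return self.classifier(fused)
```

Concatenation is the simplest fusion choice; it lets the classifier weigh whichever representation is most discriminative for a given clip.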
Researchers warn 'humans cannot reliably detect' audio deepfakes even when trained
AI-generated audio that mimics humans can be so convincing that people can't tell the difference a quarter of the time, even when they're trained to identify faked voices, a new study claims. Researchers at University College London investigated how accurately humans can differentiate between AI-generated audio and organic audio, according to a report in the science and medical journal PLOS ONE. The study comes amid the rise of deepfakes: videos and pictures edited to appear as if they are actual images of other people. "Previous literature has highlighted deepfakes as one of the biggest security threats arising from progress in artificial intelligence due to their potential for misuse," researchers wrote in their paper published this month.
- North America > United States (0.16)
- Europe > Germany (0.05)
Deepfake audio has a tell
An office worker answers the phone and hears his boss, in a panic, tell him that she forgot to transfer money to the new contractor before she left for the day and needs him to do it. She gives him the wire transfer information, and with the money transferred, the crisis has been averted. The worker sits back in his chair, takes a deep breath, and watches as his boss walks in the door. The voice on the other end of the call was not his boss. The voice he heard was that of an audio deepfake, a machine-generated audio sample designed to sound exactly like his boss.
La veille de la cybersécurité
With deepfake audio, that familiar voice on the other end of the line might not even be human, let alone the person you think it is.